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Learning the network structure of a large graph is computation- 
ally demanding, and dynamically monitoring the network over time 
for any changes in structure threatens to be more challenging still. 

This paper presents a two-stage method for anomaly detection 
in dynamic graphs: the first stage uses simple, conjugate Bayesian 
models for discrete time counting processes to track the pairwise links 
of all nodes in the graph to assess normality of behavior; the second 
stage applies standard network inference tools on a greatly reduced 
subset of potentially anomalous nodes. The utility of the method is 
demonstrated on simulated and real data sets. 



1. Introduction. Anomaly detection on graphs of social or communica- 
tion networks has important security applications. The definition of a graph 
anomaly typically depends on the data and application of interest. Typically 
anomaly detection focuses on the connections among the graph's entities and 
various methods have been developed for their analysis. Examples include 
spectral decompositions [an area excellently summarized in von Luxburg 
(2007)], scan statistics [Priebe et al. (2005)] and random walks [Pan et al. 
(2004), Tong, Faloutsos and Pan (2006)]. These methods are generally com- 
putationally demanding when applied to very large networks; also, in decid- 
ing upon which one to use, an explicit choice is being made on the type of 
anomaly sought. The interest of this paper is anomaly detection in large dy- 
namic networks, in a context where in principle any type of anomaly should 
be detected. We focus on problems relating to anomalies in social networks, 
and present analyses of real and simulated data from this area. In each case, 
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the network is observed over a sequence of discrete times, where each obser- 
vation provides only a partial view of the full connectivity; a complete view 
of the network is provided by the time series as a whole. 

The real data come from the European Commission Joint Research Cen- 
tre's (JRC) European Media Monitor (EMM) (http://emm.jrc.it). EMM 
is a web intelligence service, providing real-time press and media summaries 
to Commission cabinets and services, including a breaking news and alerting 
service. This service requires JRC to parse each of the news documents to 
extract the relevant information and tag the story as belonging to a par- 
ticular topic. For our analysis, JRC provided 131 weeks of Media Monitor 
data sourced from a collection of approved websites, starting from 1st Jan- 
uary 2005, although this period includes a known two-week server downtime 
at the end of the first month. The data were extracted from news articles 
tagged as being related to terrorist attacks, political unrest and security. 
The data we receive are undirected and in a simple list format showing the 
date of a reported link and the names of the two individuals involved. 

The simulated data come from the VAST Challenge 2008 (http: / /www.cs. 
umd.edu/hcil/VASTchallenge08); we consider the simulated cell phone data 
from the Mini Challenge focused in the area of social network analysis. The 
cell phone call records cover a fictional ten-day period on an island, narrowed 
to 400 unique cell phones during this period. As well as the time of each 
phone call and details of who phoned whom, an identifier of the cell tower 
from which the call originated is also given. The records should provide 
critical information about an important social network structure. From the 
results of award winning published work on this challenge [Ye et al. (2008)] , 
work which used a combination of PageRank [Brin and Page (1998)] and 
visual analytic methods, there is good reason to suspect that the major 
anomalous activity occurs on the eighth day and involves a list of at least 
eleven individuals. 

2. Two-stage approach. The idea behind the method presented here is a 
simple one: If a social network has fundamentally changed in some important 
way, then in most contexts this is likely to suggest that there are some 
individuals who are now either communicating more or less frequently than 
usual, or communicating with different individuals than usual. Beyond this 
view, there may well be much more subtle network structure to examine, 
but initially taking this more simple view allows good targets to be quickly 
identified, with the important possibility to then zoom-in and investigate 
such local structure. 

In this paper we present a two-stage approach to dynamic anomaly de- 
tection. The first stage is a sweep of the database to identify potentially 
anomalous nodes in the network; in the second stage, a subgraph is con- 
structed around this set of nodes, usually extended to include other nodes 
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which have recently (or, perhaps, ever) communicated with a node in this 
set, and then standard network analytic tools are used to investigate struc- 
ture in this vastly reduced subnetwork. 

Technically, for each pair of individuals we independently model the com- 
munications between them over time as a counting process, with the incre- 
ments of the process following a Bayesian probability model. At any point 
in time, we test whether their relationship has changed to a degree that is 
statistically significant. If the derived predictive p- value falls below a fixed 
threshold, this represents a departure from previously modeled behavior. 
The node pair are then said to be anomalous and are added to the set of 
anomalous nodes for this period. Such an approach is statistically princi- 
pled and computationally very simple. By assuming independence of the 
processes, the method is also fully parallelizable, in the sense that each node 
pair is examined in isolation. This assumption of independence will be ap- 
proximately acceptable only in some circumstances, and a method which 
seeks to relax this assumption is considered in Section 3.4. 

Once a reduced subset of interesting nodes has been identified, standard 
network tools such as spectral clustering can be much more readily deployed; 
also, at this stage we are now interested in the simpler problem of character- 
izing structure, such as identifying clusters, rather than looking for changes 
in this structure, the latter being a task which requires additional metrics 
to be specified. 

The threshold at which p-values are judged to be significant must be set 
by the user. In this paper we use a 0.05 threshold, but smaller or larger 
critical values would lead to correspondingly smaller or larger networks of 
potentially anomalous nodes. In practice, a good threshold can be chosen to 
be as large as possible subject to the resulting anomaly network being of a 
manageable size such that follow-up investigation is feasible. 

3. Discrete time counting process models. The number of communica- 
tions over time are treated as simple Bayesian discrete time counting pro- 
cesses with conditionally independent increments. For each period in time, 
the number of communications between individuals will represent the cur- 
rent weight of their association in the network. 

We first consider some different ways of counting up the communications. 
Then, simple Bayesian probability models are given for learning about such 
counting processes. Full details of these probability models and the param- 
eterizations used are given in Supplement A [Heard et al. (2010)]. 

3.1. Pairwise, individual and total activity analysis. For each pair of in- 
dividuals starting from time when the data collection process begins, 
let Nij(t) be the number of communications made from i to j up until dis- 
crete time t; alternatively, for a simpler binary view of the network, let Nij(t) 
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be the number of periods in which i has communicated with j by time t. If 
the graph is undirected, we have the simplification Nij(t) = Nji(t). 

Let Pij be a probability model for the increments dNij(t) = Nij(t) — 
Nij(t — 1) under normal circumstances. In the simplest setting, we can con- 
sider dNij(l),dNij(2), ... as independent realizations from P^; the distribu- 
tion Pij corresponds to the normal mode of communication behavior for this 
pair of individuals. Anomalous behavior at time t, on the other hand, can 
be regarded as a value of dNij(t) drawn from a distribution other than P^. 
The aim is then to detect which values of dNij(t) are not draws from the 
unknown P^. 

For a realized value of dNij(t) = n, we find a two-sided Bayesian p- value 
as the posterior probability of observing a count as extreme as n; this poste- 
rior distribution is a marginal calculation based on our revised beliefs about 
the unknown distribution P^ in light of all other periods of data we have 
observed. Carefully chosen conjugate Bayesian models allow for this infer- 
ential process to be analytically tractable [see Bernardo and Smith (1994) 
for details]. For example, a (simplistic) parametric choice for P^ could be 
Poisson(Ajj) for unknown rate parameter Ajj > 0. Completing the model 
specification with a gamma prior for Ay ensures the posterior predictive 
distribution for a future period is calculable as a simple ratio of Poisson- 
gamma mass functions. Where no obvious parametric form for P^ exists, 
nonparametric Bayesian inference is available via the Dirichlet process [Fer- 
guson (1973)]. 

In the absence of specific prior information about any of the nodes, iden- 
tical prior distributions are adopted for each of the node pair counting pro- 
cesses Nij(t). So in the first observation period, each node pair has the same 
probability of being active and, hence, the implied model on the whole net- 
work belongs to the well-known class of exponential random graph models 
(ERGMs) [Wasserman and Pattison (1996)]. From the second time period 
onward, however, the posterior predictive distributions will differ between 
node pairs according to the activity which has been observed and so here 
we see a departure from ERGMs. 

The framework above can be regarded as an independent, pairwise anal- 
ysis of the members of the network. If dNij(t) is the adjacency of node i 
to node j at time t, then a similar individual-based analysis considers the 
outdegree and indegree of node i, given by the respective increments of 
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These two summed processes correspond to the number of outgoing [Equa- 
tion (3.1)] and incoming [Equation (3.2)] communications over time for indi- 
vidual i. For an undirected graph the indegree and outdegree are equivalent 
and Equation (3.2) is redundant. Again, we can assume exchangeable incre- 
ments following similar probability models for these processes and look for 
outlying values in each time period. 

Finally, as a highest level summary, we can monitor the degree sum of the 
network over time, given by the increments of 



where the two definitions correspond to undirected and directed graphs re- 
spectively. Such processes monitor the overall network activity level. Again, 
the same conjugate Bayesian probability models can be applied at this level. 

3.2. Parametric inference and hurdle models. Social network graphs are 
typically sparse [Faloutsos, McCurley and Tomkins (2004)]. Particularly in 
larger networks, most pairs of individuals will not communicate with one 
another, suggesting a vanishing fraction of node pairs actually have an edge 
between them. This sparsity can be seen as providing an analytical advan- 
tage here, as there will be fewer nontrivial node pair relationships in the 
graph. 

However, when the network is viewed temporally, the sparsity of the net- 
work is further increased. As we will see in the examples later, even individ- 
uals who are related will spend much of the time not communicating. This 
type of sparsity is problematic when modeling the counting processes, as the 
large number of time periods showing zero communications mean that stan- 
dard exponential family distributions are inappropriate for modeling normal 
behavior. 

We extend the exponential family probability models to their hurdle vari- 
ants [Mullahy (1986)], which incorporate additional probability variables for 
determining whether or not the node pair are active in a given period t. The 
modeling of the process dNijit) is split into two parts, first a hurdle process 
for determining whether dNij(t) = or dNij(t) > 0, and then second another 
stochastic process governing the value taken by dNij(t) at those times when 
the hurdle process dictates that dNij(t) > 0. 

At time t let Aij (t) be the number of time periods u < t in which d/Vy (u) > 
0, meaning the node pair were active. The increment for time t, dAij(t) = 
Aij{t) — Aij(t — 1) takes value or 1, with dAij(t) = 1 indicating the pair 
were active in time period t. A counting process model with Bernoulli in- 
crements specifies Aij(t). 

For times when the two individuals are active, the hurdle model also 
requires a second model for the increments dNij(t) > 1. We use the shifted 
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quantities dNij(t) — 1 > to define the increments of a second counting 
process dBij(s) by the equations 

dB i:j (s) = dN lj (t s )-l, s = l,2,3,..., 

t s =mm{t:Aij(t) =s}, 

with resulting counting process Bij(s) = YXi=i dBij(u). 

For the hurdle model we therefore need to specify two (typically inde- 
pendent) models for the counting processes Bij(-) and -Ay(-)- Assuming 
independence of -Ay(-) and By(-), the increments of the compensator Ajj(-) 
for the process iVy(-) can be expressed as 

dAij(t) =E[dNij {t)\Ht- X \ =E[dA ij (t)\n t -imdB ij (t)\n t -i] + 1), 

where for N(t) = {Nij(t) :i ± j}, H t = {N{u)\u = 0, 1, 2, . . . , t} is the history 
of the processes up until time t. Then, since 

E[d2Vg(t)|7£t-i] =E[dA ij (t)\H t -i](E[(dB ij (t) + l) 2 |^<_i]) 

= E[dA ij (t)\Ht-i}{v&r[dB ij (t)\Ht-i} + 1} 

+ dA ij (t)E[dB ij (t)\n t -i}, 

it follows that the increments of the predictable variation of the counting 
process martingale My(t) = Nij(t) — Ajj(i) satisfy 

d{Mij{t)) =E[dA ij (t)\Ht-i}{v&r[dB ij (t)\H t -i] + 1} 

+ dk ij {t)^[dB ij {t)\H t ^ l \ - dk%(t). 

These equations can be used for checking how well the models for Nij(t) 
compare in fitting the data. 

3.2.1. Bernoulli process. The hurdle process increments {eL4.y(i)} are 
most simply treated as a Bernoulli process 

dAij(t) ~ Bernoulli (7Tjj), t = 1,2,3, ... , 

where 1 — iiij can now be much greater than the zero count probability 
prescribed by standard exponential family models. Note that this assumes 
independence of the increments. 

3.2.2. Markov chain. To enable simple dependence on the activity status 
in the previous time period, an alternative Markov model instead considers 

(3.3) </nj = PT(dAij(t) = l\dAij(t - 1) = 1), 

(3.4) i/jij = PT(dAij(t) = l\dAij(t - 1) = 0). 

For a comparable marginal probability to 7Tjj in the Bernoulli process 
model, note that the stationary distribution for this Markov chain implies 
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an equilibrium probability for the pair being active \dAij{t) = 1] at any 
particular time t given by 

i + Vty - 4>ij ' 

Model specification for can use standard exponential family distribu- 
tions such as Poisson or geometric; combined with conjugate beta priors for 
the hurdle probabilities above, we retain fully conjugate Bayesian inference 
for Nij(t). 

3.3. Nonparametric inference. If, even with the hurdle extensions, it is 
still unclear what would be a suitably simple parametric model for the num- 
ber of communications, then a useful conjugate, nonparametric Bayesian 
alternative is the Dirichlet process (DP) of Ferguson (1973). Using a base 
measure which is a small positive scalar multiple of, say, a hurdle expo- 
nential family distribution, allows fully coherent but data driven inference 
which is largely reliant on the tail probabilities of the empirical histogram 
of observed counts. 

3.4. Multinomial extensions. For directed graphs, a related approach 
considers using the ideas above to first model the overall counting process of 
activity, say, for an individual i. Then, given a particular number of commu- 
nications involving individual i, we consider categorical modeling of which 
classes of communication they will be. The classes could correspond to other 
individuals in the network, or if the links are labeled with categorical types, 
these classes could be the link types observed. 

Suppose dNi.it) = n, so in the ith time period individual % makes n com- 
munications. Concentrating on whom the communications were with, let 
Pij be the probability that any contact made by individual i will be to 
individual j. Then assuming independence between subsequent communica- 
tions, dNij(t) ~ Binomial(n,pij). More generally, using the vector notation 
dNi-(t) = (dN a (t), . . .,dW i(i _ 1) (t),diV i(i+1) (t), . . .), 

dNi—(t) ~ Multinomial(n, j>j_). 

Standard conjugate Bayesian inference under the multinomial model uses a 
Dirichlet prior for the class probabilities; see Bernardo and Smith (1994). 

For a familiar goodness-of-fit hypothesis test of multinomial data, we 
could contrast the observed counts dNij(t) with the expected npij through 
the familiar likelihood ratio test statistic 
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so performing a x 2 significance test. However, such a test would not incor- 
porate uncertainty in the overall number of communications, as it is based 
conditionally on observing dNi.(t) = n. Hence, we obtain an augmented like- 
lihood ratio test statistic 



■j:dNij{t)>Q 



dNij(t)k>i 



dN l3 {t) 
riE(pij) 



\og{Pi.(dNi.(t)=n)} 



which also takes into account the uncertainty in d/Vj.(t). 

In summary, a probability model for the overall counting process Ni.(t) for 
individual i, along with the multinomial model, specifies joint a distribution 
for the pairwise counting processes {Nij(t)}. The induced dependence of 
these split counting processes on one another will depend on the nature of 
the probability model for the total number of observations 7Vj.(t); in the 
special case where this model is Poisson, the processes will be independent 
of one another. 



4. Sequential and retrospective analyses. Typically data for a dynamic 
social network will arrive as an online stream. At each discrete time t we 
will have two inferential possibilities. The first is to decide whether the new 
data at t is anomalous compared to the previous data gathered, to which we 
give the term sequential analysis. For sequential analysis at time t, we are 
concerned with the distribution I > r(dNij(t)\7it-i) ■ The second possibility is 
to revise our decisions about all previous periods in light of the new data, to 
which we give the term retrospective analysis. For retrospective analysis at 
time t, we are concerned with the distributions {Pr(dNij (u)\7it / N (u)) : 1 < 
u<t}, where Ht/N(u) represents the history of the processes if their values 
at time u were not observed. 

The difference between sequential and retrospective analyses is most pro- 
nounced for times near the start of the process. In sequential analysis, it 
is unlikely that the earliest time points will be flagged as being anomalous, 
since early on there are very few data points with which to compare the 
current observation. However, in retrospective analysis, we can look back to 
these early time points and now revise our opinion, in light of all that has 
been seen since, as to whether those periods were in fact anomalous. 

Retrospective analysis can be seen as the more thorough inferential tool, 
as it contains sequential analysis as a special case. Sequential analysis alone 
is faster and more immediately relevant. Once the process has been running 
for sufficiently long, subsequent retrospective analyses of a much earlier time 
point should eventually converge in opinion, as should the retrospective and 
sequential analyses for more recent time points. 
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5. Results. 

5.1. EMM. Here we apply our anomaly detection methods to the real 
EMM network data provided by JRC. The weekly counts of the contacts 
made by all individuals found in news website stories relating to terrorist 
attacks, political unrest or security between 1 January 2005 and 11 July 
2007 are shown in the top panel of Figure 1. 

The counting process and compensator for the activity of the whole net- 
work are shown in the second row of Figure 1. These results have been 
obtained from the sequential Dirichlet process model with an uninformative 
negative binomial base measure [using parameter pairs (0.1,0.01) for whole 
network analysis, and later (0.1,0.1) for individual and pairwise analyses; 
see Appendix D of Supplement A, Heard et al. (2010)]; parametric analysis 
with a hurdle Poisson-gamma mixture with the same parameters gives very 
similar results. Unsurprisingly, the compensator is over-predicting activity 
during the known server downtime occurring in the first month, and has 
subsequently under-predicted the total cumulative activity for a long while 
after this experience. Note that the counting process martingale increments 
and their predictable variation (third row of Figure 1) stabilize much earlier 
than this, with the only major departures of the residuals from ±2 standard 
deviations occurring at the corresponding spikes in the count data. These 
points also coincide with the lowest predictive p-values in the fourth row of 
Figure 1. Note that most of the remaining significant p- values (using a 0.05 
threshold) in this graph correspond to highly negative martingale residuals, 
suggestive of further possible server downtimes. 

It is instructive to note that the sequential analysis p-values do not show 
the known server downtime to be significant. This is because we are still 
very much in the learning phase when the server failure occurs, and with 
uninformative prior beliefs the downtime is quite acceptable; rather, it is 
the period immediately after the downtime that is deemed anomalous. In 
contrast, a retrospective analysis (bottom panel of Figure 1) conducted at 
the end of the study correctly shows the downtime to be the anomalous 
period, with very small p-values. 

For the pairwise and individual analyses we simply discard all data before 
the known server downtime. Overall, there are 1814 individuals involved in 
the network through the course of the observation period. The most directly 
connected individual is the president of the United States of America during 
the data collection period, George W. Bush, eventually making connections 
with 179 other nodes. But as mentioned earlier, social network graphs are 
typically sparse and here only 2817 of the possible 1,644,391 node connec- 
tions are ever made. 

Across parametric and nonparametric models, the highest count of anoma- 
lous nodes identified through either individual behavior or pairwise interac- 
tions occurs on the 73rd week after the downtime, ending 28 June 2006 (see 
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2005 2006 2007 

Week 

Fig. 1. Top: The number of contacts made each week by all individuals in the EMM 
data set. 2nd row: The counting process and compensator for the whole network activity 
under the sequential Dirichlet process model. 3rd row: The martingale residuals; the dashed 
lines represent 2 times the square root of the predictable variation of the process, ^th row: 
The sequential analysis predictive p-values for the observed counts; crosses indicate values 
falling below a 0. 05 threshold. Bottom: p-values from retrospective analysis. 



BAYESIAN ANOMALY DETECTION METHODS FOR SOCIAL NETWORKS 11 




J FMAMJ JASONDJ FMAMJ JASONDJ FMAMJ J 
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Fig. 2. The number of active nodes each week (top) and then the number of anomalous 
nodes found each week respectively under two models, the hurdle Poisson-gamma and the 
DP with Poisson-gamma base measure. 

Figure 2). Interestingly, this was a time of continuing violence in the Mid- 
dle East, which a fortnight later would see the beginning of the the 2006 
Lebanon War; it was also a week in which the Sudanese government was in 
disagreement with the United Nations over humanitarian involvement in the 
conflict in Darfur. The extended network from spectral clustering of all com- 
municators in that week is shown in Figure 3, and important figures from 
the stories mentioned can be seen toward the top of the network. Nodes 
or links identified as anomalous, according to the individual or pairwise 
Poisson-gamma analyses respectively, are highlighted in red. 

Taking an example pair of anomalous behaving nodes from the network 
in Figure 3, in Figure 4 we briefly examine the relationship of Ehud Olmert 
and Mahmoud Abbas. Mr. Olmert became Prime Minister of Israel on 4 
January 2006 (this is apparent from his growing profile in the top panel of 
Figure 4), and Mr. Abbas has been President of the Palestinian National 
Authority from 15 January 2005. In the interesting week ending 28 June 
2006, both individuals are showing higher than usual individual activity 
(although not their highest ever, see the top two panels of Figure 4); but 
more significantly, when viewed as a pair (bottom panel of Figure 4) they 
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Fig. 3. Network of all active individuals for the week ending 28 June 2006. Anomalous 
links (under sequential Dirichlet process analysis) and individuals are highlighted in red. 
The nodes and links of this graph are interactive in the online version of this paper. 



are showing a high peak of connectivity in the week ending 28 June 2006 
which is unmatched at any other time over the observation period; during 
that week, they had met in person for the first time since Mr. Olmert had 
taken office, and had agreed to a first official summit. Clearly this behavior 
should be considered anomalous and, thus, they take their place in the larger 
network of interesting nodes in Figure 3. 

5.2. VAST 2008. The simulated call data from the IEEE VAST 2008 
Challenge are recorded in real time, and have sufficient realism that phone 
calls are not seen to be made uniformly throughout the day, rather there are 
peak and off-peak periods. With only ten days of data, to avoid modeling of 
daily cyclical effects and obtain a sufficient number of homogeneous calling 
periods for analysis, we used the histogram of daily phone calls over the 
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Fig. 4. Individual news report frequencies for Ehud Olmert and Mahmoud Abbas, fol- 
lowed by reported contacts between the pair. 



ten days across the whole of the network (Figure 5, top left) to identify the 
broad phone call pattern; the day was then broken up into five subintervals 
of equal call frequencies with respect to the histogram. Thus, we obtain fifty 
relatively homogeneous periods for our analysis (Figure 5, top right). 

From the results of Ye et al. (2008) we should suspect that the major 
anomalous activity begins on the eighth day. This is not apparent from Fig- 
ure 5, which on the bottom row also shows the number of active nodes mak- 
ing or receiving calls in each period and the total minutes called across the 
network in each period. Together, these plots do not reveal any departures 
from normality until the end of the ninth day. Clearly then, from a perspec- 
tive of timely anomaly detection, this anomalous behavior is not going to 
simply coincide with just a change in the overall activity of the network. It 
can be noted here that an improvement offered by our following sequential 
analysis over the methods used by Ye et al. (2008) is that the latter made 
use of the data across all of the ten days to detect the major change in the 
social network; our aim will be to detect the anomaly in (discretized) real 
time. 

For these data it seems appropriate to use the Markov formulation given 
in Equations (3.3) and (3.4); across all individuals in the data set, the mean 
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Fig. 5. Top left: Distribution of phone call start times throughout the day across the 
whole network for the VAST 2008 data set. Top right: Call counts across the whole network 
after days have been split into five homogeneous intervals. Bottom left: Number of active 
nodes in each subinterval. Bottom right: Total call durations across the network in each 
subinterval. 



empirical estimates of 0j. and ipi, would be 0.63 and 0.48 respectively, so an 
individual making calls becomes more likely to continue making calls into 
the next period. Because this data set is quite short, we choose to construct 
empirical priors using overall means and variances across all of the call data 
to get broad insights into typical call volumes and their variability, which 
would hopefully mirror the type of prior knowledge which should be available 
from a domain expert. Similar but not identical results can still be achieved 
with uninformative priors. 

Figure 6(left) shows the number of anomalies we find in each time segment 
under a sequential individual analysis of the call network nodes using a sim- 
ple Bernoulli "on-off " view of node activity but incorporating the Markov 
assumption. There is a clear maximum at the start of the eighth day. All 
eight of the anomalous nodes found are from the list deduced by Ye et al. 
(2008) using all of the data and several combined methodologies. 

A spectral cluster plot made using two components of the symmetric 
Laplacian of the historical adjacency matrix [von Luxburg (2007)] is given 
in Figure 7. The structure is interesting, with six of the anomalous nodes 
appearing together in pairs at extremes of the diagram. As shown in Ye 
et al. (2008), these ID pairs are each actually one individual who switches 
from using one cell phone to another shortly before the anomalous event 
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occurs. Besides these three pairs, we find two of the remaining five individ- 
uals declared significant in the social network by Ye et al. (2008), although 
all are very much toward the center of our filtered graph and so are clearly 
important figures. Having found the major anomalous activity, it is a small 
matter to identify the remaining participants. Caller ID 200, for example, 
is the leader of the social network but is not detected as anomalous by our 
method; however, a simple investigation of the call activity of the set of 
anomalous nodes detected reveals ID 200 to be the most frequent commu- 
nicator in the network with this group, and then the undetected ID 3 is one 
of only six nodes who ever communicate with the network leader; finally, as 
noted in Ye et al. (2008), "the person whose ID is communicated with all 
the important people who communicated with 200." 

To better understand the nature of the anomalous behavior of the cir- 
cled nodes in Figure 7, we monitor their collective call activity and cell- 
tower usage as a group using the multinomial model from Section 3.4 with 
a Poisson-gamma model for the number of calls; the cell towers reveal the 
group's locations on the fictional island each day and so enable us to track 
their movements. The group's p-values from the multinomial model for each 
of the ten days are shown in Figure 6 (right), with the two sets of values 
corresponding to considering the number of calls made as either fixed or 
random. In obtaining these results, a Poissom-gamma(16/9, 2/9) model was 
used for the number of calls so that the expected number of calls equaled 
the number of nodes, and a flat Dirichlet = 1/30 prior used for the multi- 
nomial model; see Supplement A [Heard et al. (2010)] for details. In terms 
of call volumes, the six and seventh days see a big drop in the group's call 
activity from an average of over eight calls per day to just one and two calls 
in total on those two respective days, followed by a surge of call activity 
on the important anomalous eighth day (30 calls) and onward (23 and 27 
calls respectively). From a cell tower perspective, day 2 is found to be fairly 
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Fig. 6. Left: The number of anomalous caller IDs found in each time segment under a 
Markov-Bernoulli model. Right: The p-values of the anomalous ID cell-tower usage under 
the multinomial model; the circular points are the p-values when the number of calls is 
treated as random, the crosses consider the number of calls as a known quantity. 
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Fig. 7. Calls between cell phones during the most anomalous period, occurring at the 
start of the eighth day. Nodes which were identified by Ye et al. (2008) as suspicious are 
colored red; nodes identified as anomalous in the present analysis are circled. 

anomalous as the group moves to using a new cell tower, number 30 (three 
times), whereas day 3 sees a shift in the balance of how the same set of 
towers are used. Besides the high call volumes, the eighth day also sees the 
group use seven hitherto unused cell towers — towers 7, 9, 17, 20, 21, 22 and 
28, and the ninth day sees a first use of towers 2 and 12. The predictive 
p- values drop to near zero on these days. 

6. Discussion. We have presented a simple statistical framework for mon- 
itoring dynamic social networks, by viewing the frequency of connections 
between node pairs as simple counting processes. Bayesian learning of the 



BAYESIAN ANOMALY DETECTION METHODS FOR SOCIAL NETWORKS 17 



distribution of these counts enables predictive p-values to be determined for 
a new observation. Once a collection of interesting nodes have been identi- 
fied in this way, standard network analytical methods can be used to identify 
the anomalous network structure. 

This methodology has been successfully applied to real and simulated 
data sets of moderate size in this paper. Further, it has been remarked that 
because the methodology is mostly parallelizable and the networks are typ- 
ically sparse, scaling to very large networks is feasible. For the data sets 
presented here the analysis is already fast; for example, for the VAST data 
set preprocessed into time series of length 50, identifying anomalous indi- 
vidual activity from the 400 IDs either as callers or through being involved 
in calls took 2.0 seconds. This timing is based on code run on Matlab 7.3.0 
using one core on a 64-bit 1.86 GHz Xenon quad-core machine. 

In both analyses, there was agreement across different count models about 
the peak of anomalous behavior. Spectral clustering was used to identify 
structure in the anomalous subnetwork, which was in agreement with knowl- 
edge from other sources. 

The European Media Monitor data were from a two and a half year col- 
lection period; this could be considered as only a moderate amount of time 
in politics, probably overseeing at most one change in government in any 
represented country, for example. The simulated cell phone data covered a 
very short period, just ten days. The methodology presented is well suited 
to short or medium term modeling, as a global model is fitted across the 
whole time line and anomalies detected with respect to this global model. 

For a longer term view, a global model is not appropriate, as even nor- 
mal behavior between individuals would be expected to evolve. An adaptive 
changepoint model with local models fitted within shorter blocks of time 
provides a natural extension. Such a model would lie comfortably within 
the Bayesian modeling paradigm, although some of the simplicity of com- 
putation enjoyed here would be lost. 
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SUPPLEMENTARY MATERIAL 

Supplement A: Hurdle exponential family distributions 

(DOI: 10.1214/10-AOAS329SUPPA; .pdf). Details of the Bayesian inferen- 
tial models considered in this paper. 
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Supplement B: Mat lab /Octave code (DOI: 10. 1214/10- AOAS329SUPPB; 
.zip). Matlab code written by DJW for implementing the models used in this 
paper. 
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